Abstract. We present new mechanisms for hashing data via binary
embeddings. Contrary to most previously presented techniques, the
embedding matrix of our mechanism is highly structured. This enables us
to perform hashing more efficiently and to use less memory. What is crucial
and nonintuitive is that imposing this structure does not affect the
quality of the produced hash. To the best of our knowledge, we are the
first to give strong theoretical guarantees for the proposed binary hashing
method, by proving the efficiency of the mechanism for several classes of
structured projection matrices. As a corollary, we obtain binary hashing
mechanisms with strong concentration results for circulant and Toeplitz
matrices. Our approach is, however, much more general.


1 Hashing mechanism 

In this section we explain in detail the proposed hashing mechanism for initial
dimensionality reduction, which is used to preprocess data before it is given as
input to the autoencoder. As mentioned earlier, the mechanism is of independent
interest. We first introduce the aforementioned family of $\Psi$-regular
matrices $\mathcal{P}$, which is a key ingredient of the method.

Assume that $k$ is the size of the hash and $n$ is the dimensionality of the
data. Let $t$ be the size of the pool of independent random Gaussian variables
$\{g_1, \ldots, g_t\}$, where each $g_i \sim \mathcal{N}(0, 1)$. Assume that
$k \le n \le t \le kn$. We say that a random matrix $\mathcal{P}$ is
$\Psi$-regular if $\mathcal{P}$ is of the form:


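$$
\begin{pmatrix}
\sum_{l \in S_{1,1}} g_l & \cdots & \sum_{l \in S_{1,j}} g_l & \cdots & \sum_{l \in S_{1,n}} g_l \\
\vdots & & \vdots & & \vdots \\
\sum_{l \in S_{i,1}} g_l & \cdots & \sum_{l \in S_{i,j}} g_l & \cdots & \sum_{l \in S_{i,n}} g_l \\
\vdots & & \vdots & & \vdots \\
\sum_{l \in S_{k,1}} g_l & \cdots & \sum_{l \in S_{k,j}} g_l & \cdots & \sum_{l \in S_{k,n}} g_l
\end{pmatrix}
\tag{1}
$$

where $S_{i,j} \subseteq \{1,\ldots,t\}$ for $i \in \{1,\ldots,k\}$, $j \in \{1,\ldots,n\}$,
$|S_{i,1}| = \ldots = |S_{i,n}|$ for $i = 1,\ldots,k$, $S_{i,j} \cap S_{i,u} = \emptyset$
for $i \in \{1,\ldots,k\}$, $\{j,u\} \subseteq \{1,\ldots,n\}$, $j \ne u$, and
furthermore the following holds:

- for every column of $\mathcal{P}$, every $g_l$ appears in at most $\Psi$ entries from that column.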

Notice that all structured matrices that we mentioned in the abstract are
special cases of the 0-regular matrix. Indeed, each Toeplitz matrix is clearly
0-regular, where the subsets $S_{i,j}$ are singletons.
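
To make the Toeplitz case concrete, here is a minimal Python sketch (assuming NumPy). The function name `toeplitz_from_pool` and the choice of pool size t = n + k - 1 are illustrative assumptions, not taken from the paper. It builds a k x n Toeplitz matrix whose entries are single pool variables, so every set of pool indices is a singleton, and records how often any pool variable repeats within one column.

```python
import numpy as np

def toeplitz_from_pool(k, n, rng):
    """Return (P, idx): a k x n Toeplitz matrix and the pool index used by each entry.

    Hypothetical helper: the pool has t = n + k - 1 i.i.d. N(0, 1) variables,
    which satisfies k <= n <= t <= k * n whenever k <= n.
    """
    t = n + k - 1
    g = rng.standard_normal(t)                 # the pool g_1, ..., g_t
    rows = np.arange(k)[:, None]
    cols = np.arange(n)[None, :]
    idx = cols - rows + (k - 1)                # entry (i, j) depends only on j - i
    return g[idx], idx

rng = np.random.default_rng(0)
P, idx = toeplitz_from_pool(4, 8, rng)

# How many times does the most frequent pool variable occur in a single column?
max_repeats_per_column = max(np.bincount(idx[:, j]).max() for j in range(idx.shape[1]))
```

In this sketch each pool variable occurs at most once per column, and entries are constant along diagonals, which is the defining Toeplitz property.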

Let $\phi$ be a function satisfying $\lim_{x \to \infty} \phi(x) = 1$ and
$\lim_{x \to -\infty} \phi(x) = -1$. We will consider two hashing methods. The
first one, which we call extended $\Psi$-regular hashing, first applies a random
diagonal matrix $\mathcal{R}$ to the datapoint $x$, then the $L_2$-normalized
Hadamard matrix $\mathcal{H}$, next another random diagonal matrix $\mathcal{D}$,
then the $\Psi$-regular projection matrix $\mathcal{P}_\Psi$, and finally the
function $\phi$ (the latter applied pointwise). The overall scheme is presented below:

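$$ x \xrightarrow{\mathcal{R}} x_{\mathcal{R}} \xrightarrow{\mathcal{H}} x_{\mathcal{H}} \xrightarrow{\mathcal{D}} x_{\mathcal{D}} \xrightarrow{\mathcal{P}_\Psi} x_{\mathcal{P}_\Psi} \xrightarrow{\phi} h(x) \in \mathbb{R}^k. \tag{2} $$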

The diagonal entries of matrices $\mathcal{R}$ and $\mathcal{D}$ are chosen
independently from the binary set $\{-1, 1\}$, each value being chosen with
probability $\frac{1}{2}$. We also propose a shorter pipeline, which we call
short $\Psi$-regular hashing, where we avoid applying the first random matrix
$\mathcal{R}$ and the Hadamard matrix $\mathcal{H}$, i.e. the overall pipeline
is of the form:

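$$ x \xrightarrow{\mathcal{D}} x_{\mathcal{D}} \xrightarrow{\mathcal{P}_\Psi} x_{\mathcal{P}_\Psi} \xrightarrow{\phi} h(x) \in \mathbb{R}^k. \tag{3} $$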
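
The two pipelines can be sketched in a few lines of Python (assuming NumPy). This is only an illustration under stated assumptions: the projection matrix is taken to be Toeplitz, the nonlinearity is the sign function with sign(0) = 1, the data dimension is a power of 2 so that the Sylvester construction of the Hadamard matrix applies, and all function names (`hadamard`, `toeplitz_gaussian`, `make_hash`) are hypothetical.

```python
import numpy as np

def hadamard(n):
    """L2-normalized Hadamard matrix (Sylvester construction; n must be a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def toeplitz_gaussian(k, n, rng):
    """k x n Toeplitz matrix built from a pool of n + k - 1 i.i.d. N(0, 1) variables."""
    g = rng.standard_normal(n + k - 1)
    idx = np.arange(n)[None, :] - np.arange(k)[:, None] + (k - 1)
    return g[idx]

def make_hash(k, n, rng, extended=True):
    """Return h(x) = sign(P D H R x) (extended) or h(x) = sign(P D x) (short)."""
    R = rng.choice([-1.0, 1.0], size=n)   # diagonal of R, used only when extended
    D = rng.choice([-1.0, 1.0], size=n)   # diagonal of D
    H = hadamard(n)
    P = toeplitz_gaussian(k, n, rng)

    def h(x):
        z = H @ (R * x) if extended else x
        return np.where(P @ (D * z) >= 0, 1, -1)

    return h

rng = np.random.default_rng(0)
n, k = 64, 16
x = rng.standard_normal(n)
x /= np.linalg.norm(x)

h_ext = make_hash(k, n, rng, extended=True)
h_short = make_hash(k, n, rng, extended=False)
code_ext, code_short = h_ext(x), h_short(x)
```

The dense Hadamard product is used here only for readability; a fast Walsh-Hadamard transform would be the natural choice when efficiency matters.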

The goal is to compute a good approximation of the angular distance between
given $L_2$-normalized vectors $p$, $r$, given their compact hashed versions
$h(p)$, $h(r)$. To achieve this goal we consider the $L_1$-distance in the
$k$-dimensional space of hashes. Let $\theta_{p,r}$ denote the angle between
vectors $p$ and $r$. We define the normalized approximate angle between $p$
and $r$ as:

$$ \tilde{\theta}^n_{p,r} = \frac{1}{2k}\|h(p) - h(r)\|_1. \tag{4} $$
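
As a rough numerical illustration, the sketch below (Python with NumPy) computes the normalized approximate angle for two unit vectors at a known angle of pi/3 and compares it with the angle divided by pi. It assumes the short pipeline with a Toeplitz Gaussian matrix and the sign function; the helper name `approx_angle` and all parameter values are illustrative choices, not the paper's.

```python
import numpy as np

def approx_angle(hp, hr):
    """Normalized approximate angle: (1 / 2k) * ||h(p) - h(r)||_1 for +/-1 hashes."""
    k = hp.shape[0]
    return np.abs(hp - hr).sum() / (2.0 * k)

rng = np.random.default_rng(0)
n, k = 1024, 512

# Two unit vectors p, r at a known angle alpha = pi / 3.
p = rng.standard_normal(n)
p /= np.linalg.norm(p)
q = rng.standard_normal(n)
q -= (q @ p) * p                      # make q orthogonal to p
q /= np.linalg.norm(q)
alpha = np.pi / 3
r = np.cos(alpha) * p + np.sin(alpha) * q

# Short hashing: random sign flips D, Toeplitz Gaussian P, then the sign function.
D = rng.choice([-1.0, 1.0], size=n)
g = rng.standard_normal(n + k - 1)
P = g[np.arange(n)[None, :] - np.arange(k)[:, None] + (k - 1)]

def h(x):
    return np.where(P @ (D * x) >= 0, 1, -1)

estimate = approx_angle(h(p), h(r))
target = alpha / np.pi                # the quantity the estimate should be close to
```

Each coordinate where the two hashes disagree contributes 2 to the L1 distance, which is why the normalization is 1/(2k).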

In the next section we will show that the normalized approximate angle between
vectors $p$ and $r$ is a very precise estimate of the actual angle if the chosen
parameter $\Psi$ is small enough. Furthermore, we show an intriguing connection
between theoretical guarantees regarding the quality of the produced hash and
the chromatic number of a specific undirected graph encoding the structure
of $\mathcal{P}$. For many of the structured matrices under consideration this
graph is induced by an algebraic group operation defining the structure of
$\mathcal{P}$ (for instance, for the circulant matrix the group is a single
shift and the underlying graph is a collection of pairwise disjoint cycles and
trees, thus its chromatic number is at most 3).

2 Theoretical results 

2.1 Introduction 

We are ready to provide theoretical guarantees regarding the quality of the
produced hash. Our guarantees will be given for a sign function, i.e. for $\phi$
defined as: $\phi(x) = 1$ for $x \ge 0$, $\phi(x) = -1$ for $x < 0$. However, we
should emphasize that empirical results showed that other functions (often used
as nonlinear maps in deep neural networks), such as the sigmoid function, also
work well. It is not hard to show that $\tilde{\theta}^n_{p,r}$ is an unbiased
estimator of $\frac{\theta_{p,r}}{\pi}$, i.e. $E(\tilde{\theta}^n_{p,r}) = \frac{\theta_{p,r}}{\pi}$.

What we will focus on is the concentration of the random variable
$\tilde{\theta}^n_{p,r}$ around its mean $\frac{\theta_{p,r}}{\pi}$. We will
prove strong exponential concentration results regarding the extended
$\Psi$-regular hashing method. Interestingly, the application of the Hadamard
mechanism is not necessary and it is possible to get concentration results, yet
weaker than in the former case, also for short $\Psi$-regular hashing. As a
warm up, let us prove the following.

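Lemma 1. Let $\mathcal{M}$ be a $\Psi$-regular hashing model (either extended or
short). Then $\tilde{\theta}^n_{p,r}$ is an unbiased estimator of
$\frac{\theta_{p,r}}{\pi}$, i.e.

$$ E(\tilde{\theta}^n_{p,r}) = \frac{\theta_{p,r}}{\pi}. $$
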
Proof. Notice first that the $i$th row, call it $g^i$, of the matrix
$\mathcal{P}$ is an $n$-dimensional Gaussian vector with mean 0 and where each
element has standard deviation $\sigma_i$ for $\sigma_i^2 = |S_{i,1}| = \ldots = |S_{i,n}|$
($i = 1,\ldots,k$). Thus, after applying matrix $\mathcal{D}$ the new vector
$g^i_{\mathcal{D}}$ is still Gaussian and of the same distribution. Let us
consider first the short $\Psi$-regular hashing model. Fix some
$L_2$-normalized vectors $p$, $r$ (without loss of generality we may assume that
they are not collinear) and denote by $H_{p,r}$ the 2-dimensional hyperplane
spanned by $\{p, r\}$. Denote by $g^i_{\mathcal{D},H}$ the projection of
$g^i_{\mathcal{D}}$ into $H_{p,r}$ and by $g^i_{\mathcal{D},H,\perp}$ the line
in $H_{p,r}$ perpendicular to $g^i_{\mathcal{D},H}$. Let $\phi$ be a sign
function. Notice that the contribution to the $L_1$-sum $\|h(p) - h(r)\|_1$
comes from those $g^i$ for which $g^i_{\mathcal{D},H,\perp}$ divides the angle
between $p$ and $r$, i.e. from those $g^i$ for which $g^i_{\mathcal{D},H}$ is
inside the union $\mathcal{U}_{p,r}$ of two 2-dimensional cones bounded by two
lines in $H_{p,r}$ perpendicular to $p$ and $r$ respectively. Observe that, from
what we have just said, we can conclude that
$\tilde{\theta}^n_{p,r} = \frac{X_1 + \ldots + X_k}{k}$, where:

$$ X_i = \begin{cases} 1 & \text{if } g^i_{\mathcal{D},H} \in \mathcal{U}_{p,r}, \\ 0 & \text{otherwise.} \end{cases} \tag{5} $$

Now it suffices to notice that vector $g^i_{\mathcal{D},H}$ is a Gaussian random
variable and thus its direction is uniformly distributed over all directions.
Thus each $X_i$ is nonzero with probability exactly $\frac{\theta_{p,r}}{\pi}$
and the lemma follows. For the extended $\Psi$-regular hashing model the
analysis is very similar. The only difference is that data is preprocessed by
applying the $\mathcal{H}\mathcal{R}$ linear mapping first. Both $\mathcal{H}$
and $\mathcal{R}$ are orthogonal matrices (rotations, possibly combined with
reflections), thus their product is also orthogonal. Since such transformations
do not change angular distance, the former analysis can be applied again and
yields the proof.
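
The key probabilistic fact used in this proof, namely that a Gaussian direction separates two vectors at angle theta with probability theta/pi, can be checked empirically. The following Python sketch (assuming NumPy) uses unstructured Gaussian rows and an arbitrary angle of pi/4; the sample size and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 8, 200_000
theta = np.pi / 4

# Two unit vectors in R^n at angle theta.
p = np.zeros(n)
p[0] = 1.0
r = np.zeros(n)
r[0], r[1] = np.cos(theta), np.sin(theta)

G = rng.standard_normal((m, n))              # m independent Gaussian rows
disagree = np.sign(G @ p) != np.sign(G @ r)  # rows whose hyperplane separates p and r
empirical = disagree.mean()
expected = theta / np.pi
```

With this many samples the empirical frequency should agree with theta/pi up to sampling noise of order 0.001.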


2.2 The $\mathcal{P}$-chromatic number

As we have already mentioned, the highly organized structure of the projection
matrix $\mathcal{P}$ gives rise to an underlying undirected graph that encodes
dependencies between different entries of $\mathcal{P}$. More formally, let us
fix two rows of $\mathcal{P}$ of indices $1 \le k_1 < k_2 \le k$. We define a
graph $\mathcal{G}_{\mathcal{P}}(k_1,k_2)$ as follows:

- $V(\mathcal{G}_{\mathcal{P}}(k_1,k_2)) = \{\{j_1,j_2\} : \exists\, l \in \{1,\ldots,t\} \text{ s.t. } l \in S_{k_1,j_1} \cap S_{k_2,j_2},\ j_1 \ne j_2\}$,
- there exists an edge between vertices $\{j_1,j_2\}$ and $\{j_3,j_4\}$ iff $\{j_1,j_2\} \cap \{j_3,j_4\} \ne \emptyset$.

The chromatic number $\chi(\mathcal{G})$ of the graph $\mathcal{G}$ is the
minimal number of colors that can be used to color the vertices of the graph in
such a way that no two adjacent vertices have the same color.

Definition 1. Let $\mathcal{P}$ be a $\Psi$-regular matrix. We define the
$\mathcal{P}$-chromatic number $\chi(\mathcal{P})$ as:

$$ \chi(\mathcal{P}) = \max_{1 \le k_1 < k_2 \le k} \chi(\mathcal{G}_{\mathcal{P}}(k_1,k_2)). $$
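
For small matrices the graph and its chromatic number can be computed directly. The Python sketch below does this for a circulant matrix, under the assumption that entry (i, j) is the single pool variable with index (j - i) mod n; the function names and the exhaustive coloring routine are illustrative and only suitable for tiny graphs.

```python
from itertools import combinations

def circulant_graph(n, k1, k2):
    """Vertices and edges of the graph for rows k1, k2 of an n-column circulant matrix.

    Assumption: entry (i, j) of the matrix is the pool variable g_{(j - i) mod n}.
    """
    vertices = set()
    for l in range(n):
        j1, j2 = (l + k1) % n, (l + k2) % n   # columns where g_l sits in rows k1, k2
        if j1 != j2:
            vertices.add(frozenset((j1, j2)))
    vertices = sorted(vertices, key=sorted)
    edges = [(a, b) for a, b in combinations(range(len(vertices)), 2)
             if vertices[a] & vertices[b]]    # edge iff the two pairs intersect
    return vertices, edges

def chromatic_number(num_vertices, edges):
    """Smallest number of colors in a proper coloring (exhaustive search)."""
    adj = [[] for _ in range(num_vertices)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def colorable(c, colors, v):
        if v == num_vertices:
            return True
        for col in range(c):
            if all(colors[u] != col for u in adj[v]):
                colors[v] = col
                if colorable(c, colors, v + 1):
                    return True
        colors[v] = -1                        # undo before backtracking
        return False

    for c in range(1, num_vertices + 1):
        if colorable(c, [-1] * num_vertices, 0):
            return c
    return 0

# Maximum over all pairs of rows of a 4 x 8 circulant matrix.
chi_P = max(
    chromatic_number(len(v), e)
    for k1, k2 in combinations(range(4), 2)
    for v, e in [circulant_graph(8, k1, k2)]
)
```

In this setting every vertex pair {j, j + d} touches only the pairs shifted by d in either direction, so the graph decomposes into disjoint cycles (or isolated vertices), which is why at most 3 colors are needed.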


2.3 Concentration inequalities for structured hashing with sign 
function 


We now present our main theoretical results. Let us consider first the extended
$\Psi$-regular hashing model. The following is true.


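Theorem 1. Take the extended $\Psi$-regular hashing model $\mathcal{M}$ with $t$
independent Gaussian random variables $g_1,\ldots,g_t$, each of distribution
$\mathcal{N}(0,1)$. Let $N$ be the size of the dataset. Denote by $k$ the size
of the hash and by $n$ the dimensionality of the data. Let $f(n)$ be an
arbitrary positive function. Let $p$, $r$ be two fixed vectors
$p, r \in \mathbb{R}^n$ with angular distance $\theta_{p,r}$ between them. Then
for every $a, \epsilon > 0$ the following is true:

$$
\mathbb{P}\Big(\Big|\tilde{\theta}^n_{p,r} - \frac{\theta_{p,r}}{\pi}\Big| \le \epsilon\Big)
\ge \Big(1 - 4\binom{N}{2} e^{-\frac{f^2(n)}{2}} - 4\chi(\mathcal{P})\binom{k}{2} e^{-\frac{2a^2 t}{f^4(n)}}\Big)(1 - \Lambda),
$$

where $\Lambda = \frac{1}{\pi}\sum_{j=\frac{\epsilon k}{2}}^{k} \frac{1}{\sqrt{j}}\big(\frac{ke}{j}\big)^j \mu^j (1-\mu)^{k-j} + 2e^{-\frac{\epsilon^2 k}{2}}$
and $\mu = \frac{8k\left(a\chi(\mathcal{P}) + \Psi\frac{f^2(n)}{n}\right)}{\theta_{p,r}}$.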

Notice how the upper bound on the probability of failure $P_\epsilon$ depends on
the $\mathcal{P}$-chromatic number. The theorem above guarantees strong
concentration of $\tilde{\theta}^n_{p,r}$ around its mean and therefore
justifies theoretically the effectiveness of the structured hashing method.
This becomes clearer below.

As a corollary, we obtain the following result: 


Theorem 2. Take the extended $\Psi$-regular hashing model $\mathcal{M}$. Assume
that the projection matrix $\mathcal{P}$ is Toeplitz. Let $N$ be the size of the
dataset. Denote by $k$ the size of the hash and by $n$ the dimensionality of the
data. Let $f(n)$ be an arbitrary positive function. Let $p$, $r$ be two vectors
$p, r \in \mathbb{R}^n$ with angular distance $\theta_{p,r}$ between them. Then
for every $\epsilon > 0$ the following is true:


) n 

p, r 


■^\<k- L *)>(l-0(^)~0(k 2 e n{ 


N 2 

~4~b > 


n 



Theorem [5] follows from Theorem [T] by taking: a = n 3 , e = k i, f(n) = 
3-\/log(n) and noticing that every Toeplitz matrix is 0-regular and the corre¬ 
sponding P-chromatic number \(P) is at most 3. 

Let us switch now to the short $\Psi$-regular hashing model. The theorem
presented below is an application of Chebyshev's inequality preceded by a
careful analysis of the variance $Var(\tilde{\theta}^n_{p,r})$.

Theorem 3. Take the short $\Psi$-regular hashing model $\mathcal{M}$, where
$\mathcal{P}$ is a Toeplitz matrix. Let $N$ be the size of the dataset. Denote
by $k$ the size of the hash and by $n$ the dimensionality of the data. Let $p$,
$r$ be two vectors $p, r \in \mathbb{R}^n$ with angular distance $\theta_{p,r}$
between them. Then the following is true for any $c > 0$:


The proofs of Theorem 1 and Theorem 3 will be given in the Appendix.


3 Appendix 

In this section we prove Theorem 1 and Theorem 3. We will use notation from
Lemma 1.


3.1 Proof of Theorem 1

We start with the following technical lemma: 


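Lemma 2. Let $\{Z_1,\ldots,Z_k\}$ be a set of $k$ independent random variables
defined on $\Omega$ such that each $Z_i$ has the same distribution and
$Z_i \in \{0,1\}$. Let $\{\mathcal{F}_1,\ldots,\mathcal{F}_k\}$ be a set of
events, where each $\mathcal{F}_i$ is in the $\sigma$-field defined by $Z_i$ (in
particular $\mathcal{F}_i$ does not depend on the $\sigma$-field
$\sigma(Z_1,\ldots,Z_{i-1},Z_{i+1},\ldots,Z_k)$). Assume that there exists
$\mu < \frac{1}{2}$ such that: $\mathbb{P}(\mathcal{F}_i) \le \mu$ for
$i = 1,\ldots,k$. Let $\{U_1,\ldots,U_k\}$ be a set of $k$ random variables such
that $U_i \in \{0,1\}$ and $U_i|\mathcal{F}_i^c = Z_i|\mathcal{F}_i^c$ for
$i = 1,\ldots,k$, where $X|\mathcal{F}$ stands for the random variable $X$
truncated to the event $\mathcal{F}$. Assume furthermore that
$E(U_i) = E(Z_i)$ for $i = 1,\ldots,k$. Denote $Y = \frac{U_1 + \ldots + U_k}{k}$.
Then the following is true.

$$
\mathbb{P}(|Y - EY| > a) \le \frac{1}{\pi}\sum_{r=\frac{ak}{2}}^{k} \frac{1}{\sqrt{r}}\Big(\frac{ke}{r}\Big)^r \mu^r (1-\mu)^{k-r} + 2e^{-\frac{a^2 k}{2}}. \tag{6}
$$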


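Proof. Let us consider the event $\mathcal{F}_{bad} = \mathcal{F}_1 \cup \ldots \cup \mathcal{F}_k$.
Notice that $\mathcal{F}_{bad}$ may be represented by the union of the so-called
$r$-blocks, i.e.

$$ \mathcal{F}_{bad} = \bigcup_{Q \subseteq \{1,\ldots,k\}} \Big(\bigcap_{q \in Q} \mathcal{F}_q \cap \bigcap_{q \in \{1,\ldots,k\} \setminus Q} \mathcal{F}_q^c\Big), \tag{7} $$

where $\mathcal{F}^c$ stands for the complement of event $\mathcal{F}$. Let us
fix now some $Q \subseteq \{1,\ldots,k\}$ and write $r = |Q|$. Denote

$$ \mathcal{F}_Q = \bigcap_{q \in Q} \mathcal{F}_q \cap \bigcap_{q \in \{1,\ldots,k\} \setminus Q} \mathcal{F}_q^c. \tag{8} $$

Notice that $\mathbb{P}(\mathcal{F}_Q) \le \mu^r (1-\mu)^{k-r}$. It follows
directly from the Bernoulli scheme.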
Denote $X = \frac{Z_1 + \ldots + Z_k}{k}$. From what we have just said and from
the definition of $\{\mathcal{F}_1,\ldots,\mathcal{F}_k\}$ we conclude that for
any given $c$ the following holds:

$$ \mathbb{P}(|Y - X| > c) \le \sum_{r=ck}^{k} \binom{k}{r} \mu^r (1-\mu)^{k-r}. \tag{9} $$

Notice also that from the assumptions of the lemma we trivially get:
$E(Y) = E(X)$.

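Let us consider now the expression $\mathbb{P}(|Y - E(Y)| > a)$.

We get: $\mathbb{P}(|Y - E(Y)| > a) = \mathbb{P}(|Y - E(X)| > a) = \mathbb{P}(|Y - X + X - E(X)| > a)$
$\le \mathbb{P}(|Y - X| + |X - E(X)| > a) \le \mathbb{P}(|Y - X| > \frac{a}{2}) + \mathbb{P}(|X - E(X)| > \frac{a}{2})$.
From (9) we get:

$$ \mathbb{P}\Big(|Y - X| > \frac{a}{2}\Big) \le \sum_{r=\frac{ak}{2}}^{k} \binom{k}{r} \mu^r (1-\mu)^{k-r}. \tag{10} $$

Let us consider now the expression: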




\k—r 


(ii) 


We have: 


f < E E (y-'(i-M) 


k—r 


( 12 ) 


From the Stirling’s formula we get: r! = 
k 


-(1 + o r (l)). Thus we obtain: 


1 , fee. 


«<(!+»,.(D) E r -~nY ^(y)V(i-rt‘- r (13) 


for r large enough. 

Now we will use the following version of the standard Azuma's inequality:

Lemma 3. Let $W_1,\ldots,W_k$ be $k$ independent random variables such that
$E(W_1) = \ldots = E(W_k) = 0$. Assume that $-\alpha_i \le W_i \le \beta_i$ for
$i = 1,\ldots,k$. Then the following is true:

$$ \mathbb{P}\Big(\Big|\sum_{i=1}^{k} W_i\Big| > a\Big) \le 2e^{-\frac{2a^2}{\sum_{i=1}^{k} (\alpha_i + \beta_i)^2}}. $$

Now, using Lemma 3 for $W_i = Z_i - E(Z_i)$ and $\alpha_i = E(Z_i)$,
$\beta_i = 1 - E(Z_i)$ we obtain:

$$ \mathbb{P}\Big(|X - EX| > \frac{a}{2}\Big) \le 2e^{-\frac{a^2 k}{2}}. \tag{14} $$

Combining (13) and (14) we obtain the statement of the lemma.
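
The concentration bound used in this last step can be sanity-checked by simulation. The Python sketch below (assuming NumPy) takes centered Bernoulli variables, so that each summand lies in an interval of length 1, and compares the empirical two-sided tail of their sum with the bound 2 exp(-2 a^2 / k). It assumes the standard form of the inequality in which each centered variable is bounded; the parameter values are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
k, q, a, trials = 100, 0.5, 15.0, 50_000

X = rng.random((trials, k)) < q          # Bernoulli(q) indicators, one row per trial
S = (X - q).sum(axis=1)                  # sum of centered variables W_i in [-q, 1 - q]

empirical_tail = np.mean(np.abs(S) > a)
bound = 2.0 * np.exp(-2.0 * a**2 / k)    # alpha_i + beta_i = 1 for every i
```

The bound is not tight for these parameters; the point is only that the simulated tail stays below it.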

Our next lemma explains the role the Hadamard matrix plays in the entire
extended $\Psi$-regular hashing mechanism.

Lemma 4. Let $n$ denote data dimensionality and let $f(n)$ be an arbitrary
positive function. Let $D$ be the set of all $L_2$-normalized datapoints, where
no two datapoints are identical. Assume that $|D| = N$. Consider the
$\binom{N}{2}$ hyperplanes $H_{p,r}$ spanned by pairs of different vectors
$\{p, r\}$ from $D$. Then after applying linear transformation
$\mathcal{H}\mathcal{R}$ each hyperplane $H_{p,r}$ is transformed into another
hyperplane $H^{\mathcal{H}\mathcal{R}}_{p,r}$. Furthermore, the probability
$\mathcal{P}_{\mathcal{H}\mathcal{R}}$ that for every
$H^{\mathcal{H}\mathcal{R}}_{p,r}$ there exist two orthonormal vectors
$x = (x_1,\ldots,x_n)$, $y = (y_1,\ldots,y_n)$ in
$H^{\mathcal{H}\mathcal{R}}_{p,r}$ such that $|x_i|, |y_i| \le \frac{f(n)}{\sqrt{n}}$
satisfies:

$$ \mathcal{P}_{\mathcal{H}\mathcal{R}} \ge 1 - 4\binom{N}{2} e^{-\frac{f^2(n)}{2}}. $$

Proof. We have already noticed in the proof of Lemma 1 that
$\mathcal{H}\mathcal{R}$ is a matrix of the rotation transformation. Thus, as an
isometry, it clearly transforms each 2-dimensional hyperplane into another
2-dimensional hyperplane. For every pair $\{p, r\}$ let us consider an arbitrary
fixed orthonormal pair $\{u, v\}$ spanning $H_{p,r}$. Denote
$u = (u_1,\ldots,u_n)$. Let us denote by $u^{\mathcal{H}\mathcal{R}}$ the vector
obtained from $u$ after applying transformation $\mathcal{H}\mathcal{R}$. Notice
that the $j$th coordinate of $u^{\mathcal{H}\mathcal{R}}$ is of the form:

$$ u^{\mathcal{H}\mathcal{R}}_j = u_1 T_1 + \ldots + u_n T_n, \tag{15} $$

where $T_1,\ldots,T_n$ are independent random variables satisfying:

$$ T_i = \begin{cases} \frac{1}{\sqrt{n}} & \text{w.p. } \frac{1}{2}, \\ -\frac{1}{\sqrt{n}} & \text{otherwise.} \end{cases} \tag{16} $$

The latter comes straightforwardly from the form of the $L_2$-normalized
Hadamard matrix (i.e. a Hadamard matrix, where each row and column is
$L_2$-normalized).

But then, from Lemma 3 and the fact that $\|u\|_2 = 1$, we get for any $a > 0$:

$$ \mathbb{P}\Big(|u_1 T_1 + \ldots + u_n T_n| \ge \frac{a}{\sqrt{n}}\Big) \le 2e^{-\frac{a^2}{2\sum_{i=1}^{n} u_i^2}} \le 2e^{-\frac{a^2}{2}}. \tag{17} $$

Similar analysis is correct for $v^{\mathcal{H}\mathcal{R}}$. Notice that
$v^{\mathcal{H}\mathcal{R}}$ is orthogonal to $u^{\mathcal{H}\mathcal{R}}$ since
$v$ and $u$ are orthogonal. Furthermore, both $v^{\mathcal{H}\mathcal{R}}$ and
$u^{\mathcal{H}\mathcal{R}}$ are $L_2$-normalized. Thus
$\{u^{\mathcal{H}\mathcal{R}}, v^{\mathcal{H}\mathcal{R}}\}$ is an orthonormal
pair.

To complete the proof, it suffices to take $a = f(n)$ and apply the union bound
over all vectors $u^{\mathcal{H}\mathcal{R}}$, $v^{\mathcal{H}\mathcal{R}}$ for
all hyperplanes.
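
The flattening effect described by this lemma is easiest to see on the most concentrated unit vector, a standard basis vector. The Python sketch below (assuming NumPy, a power-of-2 dimension, and the Sylvester construction of the Hadamard matrix; all names are illustrative) applies random sign flips followed by the normalized Hadamard matrix and records the largest coordinate magnitude of the result.

```python
import numpy as np

def hadamard(n):
    """L2-normalized Hadamard matrix (Sylvester construction; n must be a power of 2)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(2)
n = 256
R = rng.choice([-1.0, 1.0], size=n)      # diagonal of the random sign matrix

u = np.zeros(n)
u[0] = 1.0                               # all mass on a single coordinate
u_HR = hadamard(n) @ (R * u)

max_coord = np.abs(u_HR).max()           # largest coordinate magnitude after HR
norm_after = np.linalg.norm(u_HR)        # the transformation is an isometry
```

For this input every coordinate of the transformed vector has magnitude exactly 1/sqrt(n), while the Euclidean norm is unchanged.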

From the lemma above we see that applying the Hadamard matrix enables us
to assume with high probability that for every hyperplane $H_{p,r}$ there exists
an orthonormal basis consisting of vectors with elements of absolute values at
most $\frac{f(n)}{\sqrt{n}}$. We call this event $\mathcal{E}_f$. Notice that
whether $\mathcal{E}_f$ holds or not is determined only by $\mathcal{H}$,
$\mathcal{R}$ and the initial dataset $D$.

Let us proceed with the proof of Theorem 1. Let us assume that event
$\mathcal{E}_f$ holds. Without loss of generality we may assume that we have the
short $\Psi$-regular hashing mechanism with an extra property that every
$H_{p,r}$ has an orthonormal basis consisting of vectors with elements of
absolute value at most $\frac{f(n)}{\sqrt{n}}$. Fix two vectors $p$, $r$ from
the dataset $D$. Denote by $\{x, y\}$ the orthonormal basis of $H_{p,r}$ with
the above property. Let us fix the $i$th row of $\mathcal{P}$ and denote it as
$(p_{i,1},\ldots,p_{i,n})$. After being multiplied by the diagonal matrix
$\mathcal{D}$ we obtain another vector:

$$ w = (p_{i,1}d_1, \ldots, p_{i,n}d_n), \tag{18} $$

where:

$$ \mathcal{D} = \begin{pmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{pmatrix}. \tag{19} $$



We have already noticed in the proof of Lemma 1 that it is the projection
of $w$ into $H_{p,r}$ that determines whether the value of the associated random
variable $X_i$ is 0 or 1. To be more specific, we showed that $X_i = 1$ iff the
projection is in the region $\mathcal{U}_{p,r}$. Let us write down the
coordinates of the projection of $w$ into $H_{p,r}$ in the $\{x, y\}$-coordinate
system. The coordinates are the dot-products of $w$ with $x$ and $y$
respectively, thus in the $\{x, y\}$-coordinate system we can write $w$ as:

$$ w_{\{x,y\}} = (p_{i,1}d_1x_1 + \ldots + p_{i,n}d_nx_n,\ p_{i,1}d_1y_1 + \ldots + p_{i,n}d_ny_n). \tag{20} $$


Notice that both coordinates are Gaussian random variables and they are
independent since they were constructed by projecting a Gaussian vector into two
orthogonal vectors. Now notice that from our assumption about the structure
of $\mathcal{P}$ we can conclude that both coordinates may be represented as
sums of weighted Gaussian random variables $g_l$ for $l = 1,\ldots,t$, i.e.:

$$ w_{\{x,y\}} = (g_1s_{i,1} + \ldots + g_ts_{i,t},\ g_1v_{i,1} + \ldots + g_tv_{i,t}), \tag{21} $$

where each $s_{i,j}$, $v_{i,j}$ is of the form $d_zx_z$ or $d_zy_z$ for some $z$
that depends only on $i, j$. Notice also that

$$ s_{i,1}^2 + \ldots + s_{i,t}^2 = v_{i,1}^2 + \ldots + v_{i,t}^2. \tag{22} $$

The latter equality comes from the fact that both coordinates of $w_{\{x,y\}}$
have the same distribution.

Let us denote $s_i = (s_{i,1},\ldots,s_{i,t})$, $v_i = (v_{i,1},\ldots,v_{i,t})$
for $i = 1,\ldots,k$. We need the following lemma stating that with high
probability vectors $s_1,\ldots,s_k,v_1,\ldots,v_k$ are close to being pairwise
orthogonal.


Lemma 5. Let us assume that $\mathcal{E}_f$ holds. Let $f(n)$ be an arbitrary
positive function. Then for every $a > 0$ with probability at least
$P_{succ} \ge 1 - 4\binom{k}{2} e^{-\frac{2a^2 t}{f^4(n)}}$, taken under coin
tosses used to construct $\mathcal{D}$, the following is true for every
$1 \le i_1 \ne i_2 \le k$:

$$ \Big|\sum_{u=1}^{n} s_{i_1,u} v_{i_1,u}\Big| \le a\chi(\mathcal{P}) + \Psi\frac{f^2(n)}{n}, $$

$$ \Big|\sum_{u=1}^{n} s_{i_1,u} s_{i_2,u}\Big| \le a\chi(\mathcal{P}) + \Psi\frac{f^2(n)}{n}, $$

$$ \Big|\sum_{u=1}^{n} v_{i_1,u} v_{i_2,u}\Big| \le a\chi(\mathcal{P}) + \Psi\frac{f^2(n)}{n}, $$

$$ \Big|\sum_{u=1}^{n} s_{i_1,u} v_{i_2,u}\Big| \le a\chi(\mathcal{P}) + \Psi\frac{f^2(n)}{n}. $$


Proof. Notice that we get the first inequality for free from the fact that
$x$ is orthogonal to $y$ (in other words, $\sum_{u=1}^{n} s_{i_1,u} v_{i_1,u}$
can be represented as $C\sum_{i=1}^{n} x_i y_i$ and the latter expression is
clearly 0). Let us consider now one of the three remaining expressions. Notice
that they can be rewritten as:

$$ \mathcal{E} = \sum_{i=1}^{n} d_{\rho(i)} d_{\lambda(i)} x_{\zeta(i)} x_{\gamma(i)} \tag{23} $$

or

$$ \mathcal{E} = \sum_{i=1}^{n} d_{\rho(i)} d_{\lambda(i)} y_{\zeta(i)} y_{\gamma(i)} \tag{24} $$

or

$$ \mathcal{E} = \sum_{i=1}^{n} d_{\rho(i)} d_{\lambda(i)} x_{\zeta(i)} y_{\gamma(i)} \tag{25} $$

for some $\rho, \lambda, \zeta, \gamma$. Notice also that from the
$\Psi$-regularity condition we immediately obtain that $\rho(i) = \lambda(i)$
for at most $\Psi$ elements of each sum. Get rid of these elements from each sum
and consider the remaining ones. From the definition of the
$\mathcal{P}$-chromatic number, those remaining ones can be partitioned into at
most $\chi(\mathcal{P})$ parts, each consisting of elements that are independent
random variables (since in the corresponding graph there are no edges between
them). Thus, for the sum corresponding to each part one can apply Lemma 3. Thus
one can conclude that the sum differs from its expectation (which clearly is
zero since $E(d_id_j) = 0$ for $i \ne j$) by $a$ with probability at most


Pa < 2 e 


= 1 


(26) 


P a < 2e (27) 

or 

_ 2a 2 

Pa < 2 e E S=i y-rw (28) 

Now it is time to use the fact that event $\mathcal{E}_f$ holds. Then we know
that: $|x_i|, |y_i| \le \frac{f(n)}{\sqrt{n}}$ for $i = 1,\ldots,n$.
Substituting this upper bound for $|x_i|, |y_i|$ in the derived expressions on
the probabilities coming from Lemma 3, and then taking the union bound, we
complete the proof.

We can finish the proof of Theorem 1. From Lemma 5 we see that
$s_1,\ldots,s_k,v_1,\ldots,v_k$ are close to pairwise orthogonal with high
probability. Let us fix some positive function $f(n) > 0$ and some $a > 0$.
Denote

$$ \Delta = a\chi(\mathcal{P}) + \Psi\frac{f^2(n)}{n}. \tag{29} $$

Notice that, by Lemma 5, we see that applying the Gram-Schmidt process we
can obtain a system of pairwise orthogonal vectors
$\tilde{s}_1,\ldots,\tilde{s}_k,\tilde{v}_1,\ldots,\tilde{v}_k$ such that

$$ \|\tilde{v}_i - v_i\|_2 \le k\Delta \tag{30} $$

and

$$ \|\tilde{s}_i - s_i\|_2 \le k\Delta. \tag{31} $$

Let us consider again $w_{\{x,y\}}$. Replacing $s_i$ by $\tilde{s}_i$ and $v_i$
by $\tilde{v}_i$ in the formula on $w_{\{x,y\}}$, we obtain another Gaussian
vector: $\tilde{w}_{\{x,y\}}$ for each row $i$ of the matrix $\mathcal{P}$.
Notice however that vectors $\tilde{w}_{\{x,y\}}$ have one crucial advantage
over vectors $w_{\{x,y\}}$, namely they are independent. That comes from the
fact that $\tilde{s}_1,\ldots,\tilde{s}_k,\tilde{v}_1,\ldots,\tilde{v}_k$ are
pairwise orthogonal. Notice also that from (30) and (31) we obtain that the
angular distance between $w_{\{x,y\}}$ and $\tilde{w}_{\{x,y\}}$ is at most
$k\Delta$.

Let $Z_i$ for $i = 1,\ldots,k$ be an indicator random variable that is one if
$\tilde{w}_{\{x,y\}}$ is inside the region $\mathcal{U}_{p,r}$ and zero
otherwise. Let $U_i$ for $i = 1,\ldots,k$ be an indicator random variable that
is one if $w_{\{x,y\}}$ is inside the region $\mathcal{U}_{p,r}$ and zero
otherwise. Notice that $\tilde{\theta}^n_{p,r} = \frac{U_1 + \ldots + U_k}{k}$.
Furthermore, random variables $Z_1,\ldots,Z_k,U_1,\ldots,U_k$ satisfy the
assumptions of Lemma 2 with $\mu \le \frac{8\epsilon}{\theta_{p,r}}$, where
$\epsilon = k\Delta$. Indeed, random
variables Zi are independent since vectors w XiV are independent. From what we 
have said so far we know that each of them takes value one with probability 
exactly jj. Furthermore Z, ^ Ui only if w XiV is inside U p ^ r and w x ,y is outside 
U Pt r or vice versa. The latter event implies (thus it is included in the event) that 
w X}V is near the border of the region U p , r , namely within an angular distance | 
from one of the four semilines defining lA p _ r . Thus in particular an event Zi ^ U t 
is contained in the event of probability at most 2 • 4 • | that depends only on one 

^ x,y • 

But then we can apply Lemma 2. All we need is to assume that the premises
of Lemma 5 are satisfied. But this is the case with probability specified in
Lemma 4, and this probability is taken under random coin tosses used to produce
$\mathcal{H}$ and $\mathcal{R}$, thus independently from the random coin tosses
used to produce $\mathcal{P}$. Putting it all together we obtain the statement
of Theorem 1.

3.2 Proof of Theorem 3

We will borrow some notation from the proof of Theorem 1. Notice however that
in this setting no preprocessing with the use of matrices $\mathcal{H}$ and
$\mathcal{R}$ is applied.

Lemma 6. Define $U_1,\ldots,U_k$ as in the proof of Theorem 1. Assume that the
following is true:

$$ \Big|\sum_{u=1}^{n} s_{i_1,u} v_{i_1,u}\Big| \le \Delta, $$

$$ \Big|\sum_{u=1}^{n} s_{i_1,u} s_{i_2,u}\Big| \le \Delta, $$

$$ \Big|\sum_{u=1}^{n} v_{i_1,u} v_{i_2,u}\Big| \le \Delta, $$

$$ \Big|\sum_{u=1}^{n} s_{i_1,u} v_{i_2,u}\Big| \le \Delta $$

for some $0 < \Delta < 1$. Then the following is true for every fixed
$1 \le i < j \le k$:

$$ |\mathbb{P}(U_iU_j = 1) - \mathbb{P}(U_i = 1)\mathbb{P}(U_j = 1)| = O(\Delta). $$

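The lemma follows from exactly the same analysis that was done in the
last section of the proof of Theorem 1, thus we leave it to the reader as an
exercise. Notice that we have:

$$ Var(\tilde{\theta}^n_{p,r}) = Var\Big(\frac{U_1 + \ldots + U_k}{k}\Big) = \frac{1}{k^2}\Big(\sum_{i=1}^{k} Var(U_i) + \sum_{i \ne j} Cov(U_i, U_j)\Big). \tag{32} $$

Since $U_i$ is an indicator random variable that takes value one with
probability $\frac{\theta_{p,r}}{\pi}$, we get:

$$ Var(U_i) = E(U_i^2) - E(U_i)^2 = \frac{\theta_{p,r}}{\pi}\Big(1 - \frac{\theta_{p,r}}{\pi}\Big). \tag{33} $$

Thus we have:

$$ Var(\tilde{\theta}^n_{p,r}) = \frac{1}{k}\frac{\theta_{p,r}}{\pi}\Big(1 - \frac{\theta_{p,r}}{\pi}\Big) + \frac{1}{k^2}\sum_{i \ne j} Cov(U_i, U_j). \tag{34} $$

Notice however that $Cov(U_i, U_j)$ is exactly:
$\mathbb{P}(U_iU_j = 1) - \mathbb{P}(U_i = 1)\mathbb{P}(U_j = 1)$.
Therefore, using Lemma 6 we obtain:

$$ Var(\tilde{\theta}^n_{p,r}) = \frac{1}{k}\frac{\theta_{p,r}}{\pi}\Big(1 - \frac{\theta_{p,r}}{\pi}\Big) + O(\Delta). \tag{35} $$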

It suffices to estimate parameter $\Delta$. We proceed as in the previous proof.
We only need to be a little bit more cautious since the condition
$|x_i|, |y_i| \le \frac{f(n)}{\sqrt{n}}$ cannot be assumed right now. We select
two rows $i_1, i_2$ of $\mathcal{P}$. Notice that, again, applying the
Gram-Schmidt process we can obtain a system of pairwise orthogonal vectors
$\tilde{s}_{i_1}, \tilde{s}_{i_2}, \tilde{v}_{i_1}, \tilde{v}_{i_2}$ such that,
for $j \in \{1, 2\}$,

$$ \|\tilde{v}_{i_j} - v_{i_j}\|_2 \le \Delta \tag{36} $$

and

$$ \|\tilde{s}_{i_j} - s_{i_j}\|_2 \le \Delta. \tag{37} $$

The fact that right now the above upper bounds are not multiplied by $k$,
as was the case in the previous proof, plays a key role in obtaining nontrivial
concentration results even when no Hadamard mechanism is applied.

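We consider the related sums:
$\mathcal{E}_1 = \sum_{i=1}^{n} d_{\rho(i)} d_{\lambda(i)} x_{\zeta(i)} x_{\gamma(i)}$,
$\mathcal{E}_2 = \sum_{i=1}^{n} d_{\rho(i)} d_{\lambda(i)} y_{\zeta(i)} y_{\gamma(i)}$,
$\mathcal{E}_3 = \sum_{i=1}^{n} d_{\rho(i)} d_{\lambda(i)} x_{\zeta(i)} y_{\gamma(i)}$
as before. We can again partition each sum into at most $\chi(\mathcal{P})$
subchunks, where this time $\chi(\mathcal{P}) \le 3$ (since $\mathcal{P}$ is
Toeplitz). The problem is that applying Lemma 3 we get bounds that depend on the
expressions of the form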

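$$ \alpha_{x,i} = \sum_{j=1}^{n} x_j^2 x_{j+i}^2 \tag{38} $$

and

$$ \alpha_{y,i} = \sum_{j=1}^{n} y_j^2 y_{j+i}^2, \tag{39} $$

where indices are added modulo $n$ and this time we cannot assume that all
$|x_i|, |y_i|$ are small. Fortunately we have:

$$ \sum_{i=1}^{n} \alpha_{x,i} = 1 \tag{40} $$

and

$$ \sum_{i=1}^{n} \alpha_{y,i} = 1. \tag{41} $$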
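
The normalization identity can be verified numerically. The Python sketch below (assuming NumPy) takes a random unit vector, forms the cyclic sums of products of squared coordinates at each shift i (this reading of the quantities, with indices taken modulo n, is an assumption), and adds them up over all shifts. It also counts how many of the sums exceed a threshold, which by a Markov-type argument can be at most the reciprocal of that threshold.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 32
x = rng.standard_normal(n)
x /= np.linalg.norm(x)                   # unit vector, so the squares sum to 1

sq = x ** 2
# alpha[i - 1] = sum_j x_j^2 * x_{(j + i) mod n}^2 for shifts i = 1, ..., n
alpha = np.array([np.sum(sq * np.roll(sq, -i)) for i in range(1, n + 1)])

total = alpha.sum()                      # equals (sum_j x_j^2)^2
threshold = 0.1
num_large = int((alpha >= threshold).sum())
```

Since the sums are nonnegative and add up to 1, only a bounded number of them can be large, which is the counting step used in the argument that follows.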


Let us fix some positive function $f(k)$. We can conclude that the number of
variables $\alpha_{x,i}$ such that $\alpha_{x,i} \ge \frac{f(k)}{\binom{k}{2}}$
is at most $\frac{\binom{k}{2}}{f(k)}$. Notice that each such $\alpha_{x,i}$ and
each such $\alpha_{y,i}$ corresponds to a pair $\{i_1, i_2\}$ of rows of the
matrix $\mathcal{P}$ and consequently to the unique element
$Cov(U_{i_1}, U_{i_2})$ of the entire covariance sum (scaled by
$\frac{1}{k^2}$). Since trivially we have $|Cov(U_{i_1}, U_{i_2})| = O(1)$, we
conclude that the contribution of these elements to the entire covariance sum is
of order $\frac{1}{f(k)}$. Let us now consider those $\alpha_{x,i}$ and
$\alpha_{y,i}$ that are at most $\frac{f(k)}{\binom{k}{2}}$. These sums are
small (if we take $f(k) = o(k^2)$) and thus it makes sense to apply Lemma 3 to
them. That gives us upper bound $a = \Delta$ with probability:


> 1 _ e ~ n ( a2 7W>\ 


Taking f{k) = ( lo g( fc ) ) ^ and a = A = yyp, we conclude that: 




k 2 


(42) 


(43) 


Thus, from Chebyshev's inequality, we get the following for every $c > 0$ and
fixed points $p$, $r$:


Pi r 


n 




(44) 

That completes the proof.



